Constructing Stochastic Mixture Policies for Episodic Multiobjective Reinforcement Learning Tasks
نویسندگان
چکیده
Multiobjective reinforcement learning algorithms extend reinforcement learning techniques to problems with multiple conflicting objectives. This paper discusses the advantages gained from applying stochastic policies to multiobjective tasks and examines a particular form of stochastic policy known as a mixture policy. Two methods are proposed for deriving mixture policies for episodic multiobjective tasks from deterministic base policies found via scalarised reinforcement learning. It is shown that these approaches are an efficient means of identifying solutions which offer a superior match to the user’s preferences than can be achieved by methods based strictly on deterministic policies.
منابع مشابه
Quantile Reinforcement Learning
In reinforcement learning, the standard criterion to evaluate policies in a state is the expectation of (discounted) sum of rewards. However, this criterion may not always be suitable, we consider an alternative criterion based on the notion of quantiles. In the case of episodic reinforcement learning problems, we propose an algorithm based on stochastic approximation with two timescales. We ev...
متن کاملComposable Deep Reinforcement Learning for Robotic Manipulation
Model-free deep reinforcement learning has been shown to exhibit good performance in domains ranging from video games to simulated robotic manipulation and locomotion. However, model-free methods are known to perform poorly when the interaction time with the environment is limited, as is the case for most real-world robotic tasks. In this paper, we study how maximum entropy policies trained usi...
متن کاملUniversity of London Imperial College of Science , Technology and Medicine Department of Computing Learning to Act Stochastically
This thesis examines reinforcement learning for stochastic control processes with single and multiple agents, where either the learning outcomes are stochastic policies or learning is perpetual and within the domain of stochastic policies. In this context, a policy is a strategy for processing environmental outputs (called observations) and subsequently generating a response or input-signal to ...
متن کاملApplying the Episodic Natural Actor-Critic Architecture to Motor Primitive Learning
In this paper, we investigate motor primitive learning with the Natural Actor-Critic approach. The Natural Actor-Critic consists out of actor updates which are achieved using natural stochastic policy gradients while the critic obtains the natural policy gradient by linear regression. We show that this architecture can be used to learn the “building blocks of movement generation”, called motor ...
متن کاملFlexible State-dependant Machine Scheduling Problems Using Reinforcement Learning
This paper presents a simulation-based optimization methodology called reinforcement learning (RL) and suggests a neural approach to approximate the values when the systems under study are complex and involve large-scale decision-making sequential tasks. Computer simulation based reinforcement learning (RL) methods of stochastic approximation have been proposed in recent years as viable alterna...
متن کامل